Plagiarism Detection in arXiv
We describe a large-scale application of methods for finding plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14-year period, covering a few research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to be implemented as a real-time submission screen for a collection many times larger.
Comment: Sixth International Conference on Data Mining (ICDM'06), Dec 200
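The abstract does not spell out the detection algorithm itself. As a hedged illustration of one widely used family of techniques, and not necessarily the method used in the paper, the sketch below compares documents by overlapping word n-grams ("shingles") and uses an inverted index so that only documents sharing at least one shingle are ever compared, which is what keeps this kind of screen cheap at scale. The function names, the shingle length `n=5`, and the `threshold=0.5` cut-off are hypothetical choices.

```python
import re
from collections import defaultdict


def shingles(text, n=5):
    """Set of word n-grams ("shingles"), lowercased with punctuation dropped."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def flag_overlaps(docs, n=5, threshold=0.5):
    """Return (doc, other, share) where `share` of doc's shingles also occur in `other`."""
    sets = {doc_id: shingles(text, n) for doc_id, text in docs.items()}
    index = defaultdict(set)  # inverted index: shingle -> ids of documents containing it
    for doc_id, sh in sets.items():
        for s in sh:
            index[s].add(doc_id)
    flagged = []
    for doc_id, sh in sets.items():
        shared = defaultdict(int)  # only documents sharing at least one shingle appear here
        for s in sh:
            for other in index[s]:
                if other != doc_id:
                    shared[other] += 1
        for other, count in shared.items():
            share = count / len(sh)
            if share >= threshold:
                flagged.append((doc_id, other, round(share, 2)))
    return sorted(flagged)


docs = {
    "a": "We describe a large-scale application of methods for finding plagiarism "
         "in research document collections.",
    "b": "In this note we describe a large scale application of methods for finding "
         "plagiarism in research document collections.",
    "c": "The ART is a self-report measure of unhealthy training behaviors.",
}
print(flag_overlaps(docs))  # [('a', 'b', 1.0), ('b', 'a', 0.79)]
```

The score is asymmetric on purpose: all of document `a` reappears inside `b`, while only part of `b` comes from `a`. Heuristics for suppressing false positives, such as the ones the abstract mentions, would sit on top of a raw overlap score like this.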
Athletes’ Relationships with Training Scale (ART)
The Athletes’ Relationships with Training Scale (ART) is a self-report measure of unhealthy training behaviors and beliefs in athletes. The ART was designed for clinicians and athletic trainers to help identify athletes who engage in unhealthy training practices that could be associated with an eating disorder. The ART may also help track clinical outcomes in athletes with eating disorders who are receiving treatment. This record contains the 15-item ART as well as scoring instructions and guidelines for interpreting total scores.
Modeling Additive Structure and Detecting Interactions with Groves of Trees
Discovery of additive structure is an important step toward understanding a complex multi-dimensional function, because it allows the function to be expressed as a sum of lower-dimensional or otherwise simpler components. Modeling additive structure also opens up opportunities for learning better regression models.
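As a worked illustration of the additive form (notation introduced here, not taken from the dissertation): F is the target function, and each component f_k depends only on a subset S_k of the input variables.

```latex
F(x_1, \dots, x_n) = \sum_{k=1}^{K} f_k(\mathbf{x}_{S_k}),
\qquad S_k \subseteq \{1, \dots, n\}

% Example with n = 3 and K = 2:
F(x_1, x_2, x_3) = \underbrace{x_1 x_2}_{f_1(x_1, x_2)} + \underbrace{\sin x_3}_{f_2(x_3)}
```

Each component depends on at most two variables, so it can be examined or plotted separately from the rest of the function.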
The term statistical interaction describes the presence of non-additive effects among two or more variables in a function. When variables interact, their effects must be modeled and interpreted jointly. Detecting statistical interactions can therefore be critical for domain researchers trying to understand the underlying processes.
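One common way to make "non-additive effects" precise, stated here as an assumption rather than quoted from the dissertation: variables x_i and x_j do not interact in F when F splits into a part that ignores x_i plus a part that ignores x_j.

```latex
% No interaction between x_i and x_j:
F(\mathbf{x}) = f_{\setminus i}(\mathbf{x}) + f_{\setminus j}(\mathbf{x}),
\quad \text{where } f_{\setminus i} \text{ does not depend on } x_i
\text{ and } f_{\setminus j} \text{ does not depend on } x_j.

% Example: F(x_1, x_2, x_3) = x_1 x_2 + x_3
% x_1 and x_3 do not interact: take f_{\setminus 1} = x_3 and f_{\setminus 3} = x_1 x_2.
% x_1 and x_2 interact: any such split would force a zero mixed partial derivative, but
\frac{\partial^2 F}{\partial x_1 \, \partial x_2} = 1 \neq 0.
```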
This dissertation analyzes the benefits of modeling additive structure for prediction and interaction detection problems. It describes a new learning algorithm called Groves, an ensemble of additive regression trees. Groves builds on existing techniques such as bagging and additive models; combining them allows the ensemble to use large trees while still modeling the additive structure of the response function. The regression version of the algorithm, Additive Groves, and its classification counterpart, Gradient Groves, perform consistently well across a variety of problems and, on average, outperform a large number of other algorithms.
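A minimal sketch of the ensemble idea described above, under simplifying assumptions: a grove is a sum of a few regression trees fit by backfitting (each tree is repeatedly refit to the residual left by the others), and bagging averages groves trained on bootstrap samples. It does not reproduce how the published algorithm chooses tree size or the number of trees per grove; here both are fixed small values (`n_trees=3`, `depth=2`) for speed, whereas the abstract notes that Groves can use large trees. All function names and the synthetic data are hypothetical.

```python
import random


def best_split(X, y, features, min_leaf):
    """Best least-squares split over `features`: (sse, feature, threshold) or None."""
    n, best = len(y), None
    tot, tot_sq = sum(y), sum(v * v for v in y)
    for f in features:
        order = sorted(range(n), key=lambda i: X[i][f])
        s = sq = 0.0  # running sum and sum of squares of the left part
        for k in range(1, n):
            v = y[order[k - 1]]
            s, sq = s + v, sq + v * v
            a, b = X[order[k - 1]][f], X[order[k]][f]
            if a == b or k < min_leaf or n - k < min_leaf:
                continue
            sse = (sq - s * s / k) + (tot_sq - sq) - (tot - s) ** 2 / (n - k)
            if best is None or sse < best[0]:
                best = (sse, f, (a + b) / 2)
    return best


def fit_tree(X, y, features, depth, min_leaf=5):
    """Depth-limited regression tree: a leaf is a float, a node is (f, t, left, right)."""
    split = best_split(X, y, features, min_leaf) if depth > 0 else None
    if split is None:
        return sum(y) / len(y)
    _, f, t = split
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            fit_tree([X[i] for i in left], [y[i] for i in left], features, depth - 1, min_leaf),
            fit_tree([X[i] for i in right], [y[i] for i in right], features, depth - 1, min_leaf))


def predict_tree(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def fit_grove(X, y, n_trees=3, depth=2, passes=4):
    """A grove: n_trees trees whose sum models y, fit by backfitting."""
    features = list(range(len(X[0])))
    trees = [0.0] * n_trees  # a float acts as a constant (leaf-only) tree
    preds = [[0.0] * len(y) for _ in range(n_trees)]
    for _ in range(passes):
        for k in range(n_trees):
            # Residual left after all the other trees in the grove.
            resid = [y[i] - sum(p[i] for j, p in enumerate(preds) if j != k)
                     for i in range(len(y))]
            trees[k] = fit_tree(X, resid, features, depth)
            preds[k] = [predict_tree(trees[k], row) for row in X]
    return trees


def fit_bagged_groves(X, y, n_bags=5, seed=0, **grove_kw):
    """Bagging: train one grove per bootstrap sample of the data."""
    rng = random.Random(seed)
    groves = []
    for _ in range(n_bags):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        groves.append(fit_grove([X[i] for i in idx], [y[i] for i in idx], **grove_kw))
    return groves


def predict(groves, row):
    """Average, over groves, of the sum of each grove's trees."""
    return sum(sum(predict_tree(t, row) for t in g) for g in groves) / len(groves)


rng = random.Random(1)
X = [[rng.random() for _ in range(3)] for _ in range(300)]
y = [2 * r[0] + (1.0 if r[1] > 0.5 else 0.0) for r in X]  # additive target; x2 is irrelevant
model = fit_bagged_groves(X, y, n_bags=5, seed=0)
print(predict(model, [0.9, 0.9, 0.5]))  # true value of the target here is 2 * 0.9 + 1 = 2.8
```

Backfitting is what makes the sum of trees additive: each tree only has to explain what the others miss, so different trees can specialize on different components, although nothing in this simple version forces that division of labor.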
The additive nature of Groves makes it particularly useful for interaction detection. This dissertation introduces a new approach to interaction detection based on comparing the performance of restricted and unrestricted predictive models. Because Groves of trees allow variable interactions to be carefully controlled, they are especially well suited to this framework.
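The restricted-versus-unrestricted comparison can be sketched as follows, under stated assumptions. The unrestricted model is a grove whose trees may use every variable. In the restricted model each tree must leave out either x_i or x_j, so no single tree can combine them; here each tree tries both options and keeps the one that fits its residual better (an assumed rule). A clear rise in validation error for the restricted model suggests an interaction. The helper names, the single grove without bagging, the synthetic target, and the absence of any significance test are simplifications made for illustration.

```python
import random


def best_split(X, y, features, min_leaf):
    """Best least-squares split over `features`: (sse, feature, threshold) or None."""
    n, best = len(y), None
    tot, tot_sq = sum(y), sum(v * v for v in y)
    for f in features:
        order = sorted(range(n), key=lambda i: X[i][f])
        s = sq = 0.0
        for k in range(1, n):
            v = y[order[k - 1]]
            s, sq = s + v, sq + v * v
            a, b = X[order[k - 1]][f], X[order[k]][f]
            if a == b or k < min_leaf or n - k < min_leaf:
                continue
            sse = (sq - s * s / k) + (tot_sq - sq) - (tot - s) ** 2 / (n - k)
            if best is None or sse < best[0]:
                best = (sse, f, (a + b) / 2)
    return best


def fit_tree(X, y, features, depth, min_leaf=5):
    split = best_split(X, y, features, min_leaf) if depth > 0 else None
    if split is None:
        return sum(y) / len(y)
    _, f, t = split
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            fit_tree([X[i] for i in left], [y[i] for i in left], features, depth - 1, min_leaf),
            fit_tree([X[i] for i in right], [y[i] for i in right], features, depth - 1, min_leaf))


def predict_tree(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def fit_grove(X, y, feature_options, n_trees=3, depth=2, passes=4):
    """Backfit a grove; each tree uses whichever allowed feature set best fits its residual."""
    trees, preds = [0.0] * n_trees, [[0.0] * len(y) for _ in range(n_trees)]
    for _ in range(passes):
        for k in range(n_trees):
            resid = [y[i] - sum(p[i] for j, p in enumerate(preds) if j != k)
                     for i in range(len(y))]
            best = None
            for feats in feature_options:
                tree = fit_tree(X, resid, feats, depth)
                p = [predict_tree(tree, row) for row in X]
                err = sum((r - q) ** 2 for r, q in zip(resid, p))
                if best is None or err < best[0]:
                    best = (err, tree, p)
            _, trees[k], preds[k] = best
    return trees


def mse(trees, X, y):
    """Mean squared error of a grove (the sum of its trees) on (X, y)."""
    return sum((sum(predict_tree(t, r) for t in trees) - v) ** 2 for r, v in zip(X, y)) / len(y)


def interaction_score(X, y, Xv, yv, i, j):
    """Validation MSE of the restricted grove minus that of the unrestricted grove."""
    all_f = list(range(len(X[0])))
    unrestricted = fit_grove(X, y, [all_f])
    # Restriction: every tree omits x_i or omits x_j, so no tree sees both.
    restricted = fit_grove(X, y, [[f for f in all_f if f != i], [f for f in all_f if f != j]])
    return mse(restricted, Xv, yv) - mse(unrestricted, Xv, yv)


def make_data(n, seed):
    """Hypothetical data: x0 and x1 interact through a bump; x2 enters additively."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(3)] for _ in range(n)]
    y = [2 * r[2] + (2.0 if r[0] > 0.5 and r[1] > 0.5 else 0.0) for r in X]
    return X, y


X, y = make_data(400, seed=1)
Xv, yv = make_data(200, seed=2)
score_01 = interaction_score(X, y, Xv, yv, 0, 1)
score_02 = interaction_score(X, y, Xv, yv, 0, 2)
print(score_01, score_02)
```

In this setup `score_01` should come out clearly positive, because the bump that needs both x0 and x1 cannot be written additively, while `score_02` should stay near zero: a tree without x2 can still model the x0/x1 bump, and another tree without x0 can model the x2 term.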
The details of the proposed practical approach to interaction detection are demonstrated on real data describing the abundance of different bird species in the prairies east of the southern Rocky Mountains.